

Software Required
=================
	R (version 4.0.5) run on an x86_64, mingw32 PC
	Packages: 
		- data.table (version 1.14.0)
		- readstata13 (version 0.9.2)
		- sas7bdat (version 0.5)
		- bit64 (version 4.0.5)
		- lfe (version 2.8-3)
		- pbapply (version 1.4-3)
		- stargazer (version 5.2.2)
		- ggplot2 (version 3.3.3)
		- tikzDevice (version 0.12.3.1)
		- gmodels (version 2.18.1)
		- survival (version 3.2-10)
		- readxl (version 1.3.1)
		- fabricatr (version 0.14.0)
		- haven (version 2.4.3)
		- car (version 3.0-11)
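
Before running the R script, it may help to compare the locally installed packages with the versions listed above. The following is a minimal sketch and not part of the replication code: the helper name compare_versions is hypothetical, and the comparison is a plain string match on the version numbers copied from the list.

```r
# Versions copied from the package list above.
required <- c(
  data.table = "1.14.0", readstata13 = "0.9.2", sas7bdat = "0.5",
  bit64 = "4.0.5", lfe = "2.8-3", pbapply = "1.4-3", stargazer = "5.2.2",
  ggplot2 = "3.3.3", tikzDevice = "0.12.3.1", gmodels = "2.18.1",
  survival = "3.2-10", readxl = "1.3.1", fabricatr = "0.14.0",
  haven = "2.4.3", car = "3.0-11"
)

# Hypothetical helper (not in the replication files): reports, for each
# required package, whether it is missing, matches, or differs.
compare_versions <- function(required, installed) {
  have <- installed[names(required)]  # NA where a package is not installed
  status <- ifelse(is.na(have), "missing",
                   ifelse(have == required, "ok", "differs"))
  data.frame(package = names(required),
             required = unname(required),
             installed = unname(have),
             status = unname(status),
             stringsAsFactors = FALSE)
}

# installed.packages() returns a matrix with a "Version" column whose
# row names are the package names.
installed <- installed.packages()[, "Version"]
print(compare_versions(required, installed))
```

A status of "differs" does not necessarily mean the script will fail; it only flags a departure from the environment described here. One way to install a specific older version is remotes::install_version(), which requires the remotes package.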

	SAS

Program Files
=============
	- "Run.sas": This file generates the industry-year measures of earnings quality.  It should be run first, in SAS, to produce the 
	             earnings quality datasets that are then used in the R script.
	- "JAR Accepted Version.R": This file takes all of the raw data as inputs, cleans and processes them, and then runs the 
	                            regressions for the tables and figures in the main paper.  


Data Files
==========
Publicly available data files are in the Public Data Files folder.  The sources for these are listed in the data description section 
below.  These include any data that are freely available from a government source.  Freely available data used in this study that 
can be downloaded from non-governmental sources (e.g. BACI data, import data on Peter Schott's website) are not included in the zip 
archive but can be downloaded from the sources listed in the data description below.  Proprietary data sources require a subscription 
for download.  Identifiers for the samples are included so that subscribers who wish to replicate the results can reconstruct the 
same samples.  

Identifier Lists
================
	- CompustatIdentifiers.csv: Compustat identifiers used to create the public firm presence measure
	- CompustatSegmentIdentifiers.csv: Compustat segment identifiers used to adjust the public firm presence measure for multisegment firms
	- CombinedDataIdentifiers.csv: Year and industry identifiers of the main combined sample from the import data, ASM, and Compustat
	- BACIIdentifiers.csv: Year-industry-country pair identifiers for the BACI trade data
	- SoxIdentifiers.csv: PERMNOs of stocks used for the SOX event study tests
	- GermanIdentifiers.csv: BvD identifiers for German disclosure analysis
	- GuidanceDataIdentifiers.csv: Company guidance sample identifiers
	- AnalystIdentifiers.csv: Analyst forecast sample identifiers
	- UKIdentifiers.csv: BvD identifiers for UK falsification test
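
As an illustration of how an identifier list can be used to reconstruct a sample from a fresh download of a proprietary source, the following R sketch keeps only the rows of a download whose keys appear in the identifier file. It is not taken from the replication code: the helper name keep_identified is hypothetical, the toy data are made up, and the key columns gvkey and fyear are assumptions that should be replaced by the actual column headers in the relevant CSV file.

```r
# Hypothetical helper (not in the replication files): inner-join a fresh
# download to an identifier list on the shared key columns.
keep_identified <- function(download, identifiers, keys) {
  shared <- intersect(names(download), names(identifiers))
  absent <- setdiff(keys, shared)
  if (length(absent) > 0) {
    stop("key column(s) not in both tables: ", paste(absent, collapse = ", "))
  }
  # unique() guards against duplicated identifier rows inflating the result.
  ids <- unique(identifiers[, keys, drop = FALSE])
  merge(download, ids, by = keys)
}

# Toy data standing in for a subscriber's download and an identifier list.
# The column names are assumptions, not the actual headers of the CSV files.
download <- data.frame(
  gvkey = c("001004", "001004", "001045", "001050"),
  fyear = c(2005L, 2006L, 2005L, 2005L),
  sale  = c(10, 11, 20, 30),
  stringsAsFactors = FALSE
)
identifiers <- data.frame(
  gvkey = c("001004", "001045"),
  fyear = c(2005L, 2005L),
  stringsAsFactors = FALSE
)

replicated <- keep_identified(download, identifiers, keys = c("gvkey", "fyear"))
print(replicated)
```

In practice the identifier table would be read from one of the files above, for example with read.csv(), taking care that the key columns have the same type in both tables.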

Data Description 
================
	- US import data from the US Census Bureau were downloaded from Peter Schott’s website in June 2018.  
	  These data were used to create our ImportComp variable.  The data can be accessed at the following URL: 
	  https://www.dropbox.com/sh/7r0gj1dhl6qkww9/AABEbzavXNXXIYoOh7a3LJ4ga?dl=0
	- US Annual Survey of Manufactures and Census of Manufactures data were downloaded from the Census Bureau’s website in June 2018.  
		- The following datasets, listed by year, were downloaded from the Census Bureau’s FTP server (www2.census.gov):
			- 2002: ec0231sg102.dat
			- 2003: am0331gs102.dat, am0331gs105.dat, am0331gs106.dat
			- 2004: am0431gs102.dat, am0431gs105.dat, am0431gs106.dat
			- 2005: am0531gs102.dat, am0531gs105.dat, am0531gs106.dat
			- 2006: am0631gs101.dat
			- 2007: ec0731sg2.dat
			- 2008: am0831gs101.dat
			- 2009: am0931gs101.dat
			- 2010: am1031gs101.dat
			- 2011: am1131gs101.dat
			- 2012: ec1231sg1.dat
			- 2013: am1331gs101.dat
			- 2014: am1431gs101.dat
			- 2015: am1531gs101.dat
			- 2016: am1631gs101.dat
		- For 2001 and prior, data were obtained by scraping PDFs of the Census publication “Statistics of Industry Groups and Industries: 2001.”  
		  We downloaded the document in June 2018 and it can be accessed at the following URL: https://www.census.gov/prod/2003pubs/m01as-1.pdf
	- For our public presence variables we use Compustat annual fundamentals data (last downloaded in August 2018) and segment files (last 
	  downloaded in October 2019).  These data were obtained from the WRDS web interface.  
	- Industry concentration data were obtained from the following datasets on the US Census Bureau’s “American FactFinder” website.  Each was 
	  downloaded in December 2018.
		- EC0231SR12
		- EC0731SR12
		- EC1231SR2
	- Related-party trade data were downloaded from the Census Bureau in August 2018:
		- 2005-2016 data were obtained from this website: relatedparty.ftd.census.gov
		- 2000-2004 data were downloaded from the website https://www.census.gov/foreign-trade/Press-Release/related_party/index.html.  
		  Follow the links for years 2000-2004 and download exhibit 4 from each.
	- SOXBHAR data were constructed using the WRDS event study application in September 2018.  We use an event period of 12 trading days starting
	  with 7/8/2002.  We estimate abnormal returns as the residual from a model of expected returns based on the Fama-French and momentum 
	  factors.  We estimate firms’ factor exposures using firm returns over a 100-day window (requiring at least 70 return observations per 
	  firm) that ends before a 50-day gap preceding the event.
	- For matching industries across datasets we use a variety of concordances:
		- NAICS descriptions were obtained from https://www.census.gov/foreign-trade/reference/codes/naics/naicsmst.txt in August 2019 and 
		  used to develop a cross-reference from a few Census-specific rollup industries to 4-digit NAICS codes
		- The HS-NAICS concordance comes from Peter Schott’s website and was downloaded in October 2019 (accessible at 
		  https://www.dropbox.com/s/0yk03fmitdphhf1/hssicnaics_20181015.zip?dl=0)
		- The NACE-NAICS concordance was downloaded from https://ec.europa.eu/eurostat/ramon/relations/index.cfm?TargetUrl=LST_REL&StrLanguageCode=EN&IntCurrentPage=11  
		  in November 2019.  
	- NTR Gap information is from the replication files of “The Surprisingly Swift Decline of US Manufacturing Employment” by Pierce and 
	  Schott (AER 2016), downloaded in June 2018.  The files are accessible at https://www.aeaweb.org/aer/data/10607/20131578_data.zip 
	- Import data for non-US countries were obtained from the BACI database, downloaded in October 2019 (accessible at 
	  http://www.cepii.fr/cepii/en/bdd_modele/presentation.asp?id=1).  
	- Data for the construction of ICscores are from CRSP (returns and trade volume), Compustat (earnings, industry, earnings release dates, 
	  and shares outstanding), and EDGAR (filing dates). 
	- EDGAR download data were obtained from the EDGAR logfiles on the SEC’s website (https://www.sec.gov/dera/data/edgar-log-file-data-set.html) 
	  during April 2019.
	- IP location data were obtained from lite.ip2location.com/database/ip-country during April 2019.   
	- Bureau van Dijk’s Orbis data are used for our German and UK analyses.  We obtain these data from the University of Michigan’s June 2019 
	  historical data snapshot.  Specifically, we use the following files:
		- Key_financials-USD.txt
		- Industry_classifications.txt
		- Legal_info.txt 
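
As an illustration of the window arithmetic in the SOXBHAR description above, the following R sketch locates a 12-day event window and a 100-day estimation window separated by a 50-day gap on a trading-day calendar. It is not the WRDS event study application and makes several assumptions: the function names event_windows and enough_returns are hypothetical, the calendar contains weekdays only (market holidays are ignored), 7/8/2002 is read as July 8, 2002, and the gap is taken to be the trading days strictly between the two windows.

```r
# Hypothetical helper: return the estimation and event windows as date vectors.
event_windows <- function(trading_days, event_start,
                          event_len = 12L, est_len = 100L, gap = 50L) {
  e <- which(trading_days == event_start)[1]
  if (is.na(e)) stop("event_start is not in the trading-day calendar")
  est_end   <- e - gap - 1L            # last estimation day, just before the gap
  est_start <- est_end - est_len + 1L  # first estimation day
  event_end <- e + event_len - 1L      # last day of the event window
  if (est_start < 1L || event_end > length(trading_days)) {
    stop("calendar is too short for the requested windows")
  }
  list(estimation = trading_days[est_start:est_end],
       event      = trading_days[e:event_end])
}

# Hypothetical helper: does a firm have enough non-missing estimation returns?
enough_returns <- function(returns, min_obs = 70L) {
  sum(!is.na(returns)) >= min_obs
}

# Stand-in calendar: weekdays only (an assumption; real trading calendars
# also exclude market holidays).
all_days <- seq(as.Date("2001-09-03"), as.Date("2002-08-30"), by = "day")
wday <- as.POSIXlt(all_days)$wday      # 0 = Sunday, 6 = Saturday
weekdays_only <- all_days[!(wday %in% c(0, 6))]

w <- event_windows(weekdays_only, as.Date("2002-07-08"))
print(range(w$estimation))
print(range(w$event))
```

With an actual trading calendar (for example, the distinct dates in a CRSP daily file), the same function would return windows that skip market holidays.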

